Information Theoretic Optimization of Audio Features for Multimodal Speaker Detection

Authors

  • Murat Kunt
  • Patricia Besson
Abstract

We present a method that exploits the information-theoretic framework described in [1] to extract audio features that are optimal with respect to the video features. A simple measure of mutual information between the resulting audio features and the video features then makes it possible to detect the active speaker among several candidates. The results show that the method can exploit the speech information shared by the audio and video signals to recover their common source.
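
The final detection step described in the abstract (score each candidate by the mutual information between the audio features and that candidate's video features, then pick the highest score) can be illustrated with a minimal sketch. This is not the feature optimization of [1], which is not reproduced here. It assumes one-dimensional features and swaps in a simple plug-in histogram estimator of mutual information. The names `mutual_information` and `detect_active_speaker`, the bin count, and the synthetic signals are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(X; Y) in bits for two 1-D samples.

    Assumption: a coarse histogram estimator, chosen only for illustration.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()             # joint probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    p_indep = p_x @ p_y                    # product of marginals
    nonzero = p_xy > 0                     # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[nonzero] * np.log2(p_xy[nonzero] / p_indep[nonzero])))


def detect_active_speaker(audio_feature, video_features, bins=8):
    """Return the index of the candidate whose video feature shares the
    most information with the audio feature, plus all scores."""
    scores = [mutual_information(audio_feature, v, bins) for v in video_features]
    return int(np.argmax(scores)), scores


# Synthetic demo (hypothetical data): candidate 1's video feature is a
# noisy copy of the audio feature; candidate 0's is independent noise.
rng = np.random.default_rng(0)
audio = rng.normal(size=2000)
silent_face = rng.normal(size=2000)
speaking_face = audio + 0.3 * rng.normal(size=2000)

speaker, scores = detect_active_speaker(audio, [silent_face, speaking_face])
print("detected speaker index:", speaker)
print("MI scores (bits):", scores)
```

In this toy setting the candidate whose feature depends on the audio receives the larger score. Histogram estimates are biased for small samples, so real audio and video features would likely call for a more careful estimator and for the optimized audio features the abstract refers to.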

Similar resources

Extraction of audio features specific to speech in multimodal speaker detection

We present a method that exploits an information-theoretic framework to extract optimized audio features using video information. A simple measure of mutual information (MI) between the resulting audio features and the video features makes it possible to detect the active speaker among several candidates. Our method involves the optimization of an MI-based objective function. No approximation is introduced ...

Hypothesis testing as a performance evaluation method for multimodal speaker detection

This work addresses the problem of detecting the speaker in audiovisual sequences by evaluating the synchrony between the audio and video signals. Prior to classification, an information-theoretic framework is applied to extract optimized audio features using video information. The classification step is then defined through a hypothesis testing framework so as to get confidence levels asso...

PSO Based Optimized Reliability for Robust Multimodal Speaker Identification

Reliable speaker recognition in real environments is a key challenge for ubiquitous services in human-computer interfaces. In this paper, we present a robust multimodal speaker identification system with optimized reliability of different modalities. We propose an extension of modified convection function’s optimizing factors to account for optimum reliability simultaneously in audio, face a...

Discrimination Analysis of Lip Motion Features for Multimodal Speaker Identification and Speech-reading

In this thesis a new multimodal speaker/speech recognition system that integrates audio, lip texture, lip geometry, and lip motion modalities is presented. There have been several studies that jointly use audio, lip intensity and/or lip geometry information for speaker identification and speech-reading applications. This work proposes using explicit lip motion information, instead of or in addi...

Using Weighted Oriented Optical Flow Histograms for Multimodal Speaker Diarization

Speaker diarization currently focuses on using audio features to partition an audio stream into speaker-homogeneous speech regions, in other words to determine “who spoke when”. Recent speaker diarization corpora contain video recordings in addition to the commonly used audio. Thus, we investigated the benefits of incorporating video features, namely histograms of weighted oriented optical flo...

Publication date: 2005